Evaluating Automatically Acquired F-structures against Propbank
Abstract
[…] f-structure representations is presented by Burke et al. (2004b). The annotation algorithm is the basis for the automatic acquisition of wide-coverage and robust probabilistic approximations of LFG grammars (Cahill et al., 2004) and for the induction of subcategorisation frames (O'Donovan et al., 2004; O'Donovan et al., 2005). Annotation quality is therefore extremely important, and to date it has been measured against the DCU 105 and the PARC 700 Dependency Bank (King et al., 2003). The annotation algorithm achieves f-scores of 96.73% for complete f-structures and 94.28% for preds-only f-structures against the DCU 105, and 87.07% against the PARC 700 using the feature set of Kaplan et al. (2004). Burke et al. (2004a) provide a detailed analysis of these results. This paper presents an evaluation of the annotation algorithm against PropBank (Kingsbury and Palmer, 2002). PropBank identifies the semantic arguments of each predicate in the Penn-II treebank and annotates their semantic roles. As PropBank was developed independently of any grammar formalism, it provides a platform for making more meaningful comparisons between parsing technologies than was previously possible. PropBank also allows a much larger-scale evaluation than the smaller DCU 105 and PARC 700 gold standards. To perform the evaluation, we first automatically converted the PropBank annotations into a dependency format. Second, we developed conversion software to produce PropBank-style semantic annotations in dependency format from the f-structures automatically acquired by the annotation algorithm from Penn-II. The evaluation was performed using the evaluation software of Crouch et al. (2002) and Riezler et al. (2002). Using Section 24 of the Penn-II Wall Street Journal as the development set, we currently achieve an f-score of 76.58% against PropBank on the Section 23 test set.
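The evaluation compares two sets of dependencies per sentence: those converted from PropBank (gold) and those converted from the automatically acquired f-structures (test). As an illustration only, the sketch below shows how precision, recall and f-score can be computed over such dependencies, assuming each one is a hypothetical (role, predicate, argument head) triple and that a test triple counts as correct only when an identical gold triple occurs in the same sentence. It is not the evaluation software of Crouch et al. (2002) and Riezler et al. (2002); the function names, the triple layout and the micro-averaging convention are assumptions.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical triple layout (an assumption, not the paper's format):
# (role, predicate, argument head), e.g. ("ARG0", "acquire", "company").
Triple = Tuple[str, str, str]


def _prf(matched: int, n_gold: int, n_test: int) -> Tuple[float, float, float]:
    """Turn raw counts into (precision, recall, f-score), avoiding division by zero."""
    precision = matched / n_test if n_test else 0.0
    recall = matched / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def _counts(gold: Iterable[Triple], test: Iterable[Triple]) -> Tuple[int, int, int]:
    """Count exact matches between the gold and test triples of one sentence."""
    g, t = Counter(gold), Counter(test)
    # Counter '&' keeps the minimum count of each triple, so a triple repeated
    # more often on the test side than on the gold side is not over-credited.
    return sum((g & t).values()), sum(g.values()), sum(t.values())


def triple_scores(gold: Iterable[Triple], test: Iterable[Triple]) -> Tuple[float, float, float]:
    """Precision, recall and f-score for a single sentence."""
    return _prf(*_counts(gold, test))


def corpus_scores(
    sentences: Iterable[Tuple[Iterable[Triple], Iterable[Triple]]]
) -> Tuple[float, float, float]:
    """Micro-averaged scores: counts are summed sentence by sentence, so a test
    triple is only ever matched against gold triples from its own sentence."""
    matched = n_gold = n_test = 0
    for gold, test in sentences:
        m, g, t = _counts(gold, test)
        matched += m
        n_gold += g
        n_test += t
    return _prf(matched, n_gold, n_test)


if __name__ == "__main__":
    gold = [("ARG0", "acquire", "company"), ("ARG1", "acquire", "unit")]
    test = [("ARG0", "acquire", "company"), ("ARG1", "acquire", "stake")]
    print(triple_scores(gold, test))  # (0.5, 0.5, 0.5)
```

Summing counts per sentence rather than pooling all triples into one multiset prevents a triple in one sentence from being credited against an identical gold triple in another sentence.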
Similar resources
Long-Distance Dependency Resolution in Automatically Acquired Wide-Coverage PCFG-Based LFG Approximations
This paper shows how finite approximations of long distance dependency (LDD) resolution can be obtained automatically for wide-coverage, robust, probabilistic Lexical-Functional Grammar (LFG) resources acquired from treebanks. We extract LFG subcategorisation frames and paths linking LDD reentrancies from f-structures generated automatically for the Penn-II treebank trees and use them in an LDD...
A Statistical Generative Model for Unsupervised Learning of Verb Argument Structures
We present a statistical generative model for unsupervised learning of verb argument structures. We use the model in order to automatically induce verb argument structures for a representative set of verbs. Approximately 80% of the induced argument structures are judged correct by human subjects. The structures overlap significantly with those in PropBank; they also exhibit correct patterns of ...
BIOSMILE: Adapting Semantic Role Labeling for Biomedical Verbs: An Exponential Model Coupled with Automatically Generated Template Features
In this paper, we construct a biomedical semantic role labeling (SRL) system that can be used to facilitate relation extraction. First, we construct a proposition bank on top of the popular biomedical GENIA treebank following the PropBank annotation scheme. We only annotate the predicate-argument structures (PAS’s) of thirty frequently used biomedical predicates and their corresponding argument...
BIOSMILE: Adapting Semantic Role Labeling for Biomedical Verbs
In this paper, we construct a biomedical semantic role labeling (SRL) system that can be used to facilitate relation extraction. First, we construct a proposition bank on top of the popular biomedical GENIA treebank following the PropBank annotation scheme. We only annotate the predicate-argument structures (PAS’s) of thirty frequently used biomedical predicates and their corresponding argument...
A Semantic Kernel for Predicate Argument Classification
Automatically deriving semantic structures from text is a challenging task for machine learning. The flat feature representations usually used in learning models can only partially describe structured data. This makes it difficult to process the semantic information embedded in parse trees. In this paper a new kernel for automatic classification of predicate arguments has been d...